42 research outputs found

    Alphabet indexing for approximating features of symbols

    We consider two maximization problems of finding a mapping from a large alphabet, over which two given sets of strings are formed, to a set of very few symbols; the mapping specifies a symbol-wise transformation of strings. First, we show that the problem of finding a mapping that transforms as many of the strings as possible so that they form disjoint sets cannot be approximated within a ratio of n^{1/16} in polynomial time, unless P = NP. Next, we consider a mapping that keeps the maximum number of pairs of strings over the given sets distinct. We present a polynomial-time approximation algorithm that guarantees a ratio of k/(k − 1) for mappings to k symbols, and we prove that the problem is hard to approximate within an arbitrarily small ratio in polynomial time. Furthermore, we extend this algorithm to deal not only with pairs but also with tuples of strings, and show that it achieves a constant approximation ratio.

    Application of Approximate Pattern Matching in Two Dimensional Spaces to Grid Layout for Biochemical Network Maps

    Background: For visualizing large-scale biochemical network maps, it is important to calculate the coordinates of molecular nodes quickly and to make the maps easier to understand and trace. The grid layout is effective in drawing compact, orderly, balanced network maps with node label spaces, but existing grid layout algorithms often incur a high computational cost because they have to consider complicated positional constraints throughout the entire optimization process. Results: We propose a hybrid grid layout algorithm that consists of a fast, non-grid layout (preprocessor) algorithm and an approximate pattern matching algorithm that distributes the resulting preprocessed nodes on square grid points. To demonstrate the feasibility of the hybrid layout algorithm, it is characterized in terms of calculation time, numbers of edge-edge and node-edge crossings, relative edge lengths, and F-measures. The proposed algorithm achieves outstanding performance compared with other existing grid layouts. Conclusions: An approximate pattern matching algorithm quickly redistributes the nodes laid out by fast, non-grid algorithms onto the square grid points, while preserving the topological relationships among the nodes. The proposed algorithm is a novel use of pattern matching, thereby providing a breakthrough for grid layout. The application program can be freely downloaded from http://www.cadlive.jp/hybridlayout/hybridlayout.html

    An Approximation Algorithm for Alphabet Indexing Problem

    Alphabet Indexing is the problem of finding a mapping f : Σ → {1, ..., K} for an alphabet Σ, a positive integer K and a pair of disjoint sets of strings P, Q ⊆ Σ* such that f transforms no two strings from P and Q into identical ones. Although the Alphabet Indexing problem is NP-complete, we define a combinatorial optimization problem, Max K-Indexing, and propose a simple greedy algorithm for this problem. Then we show that the algorithm achieves the constant error ratio 1/K for K-indexing with respect to the number of distinguishable pairs, and that our problem is MAX SNP-hard. Keywords: alphabet indexing, approximation algorithm, MAX-SNP, combinatorial optimization. 1 Introduction: Given a pair of disjoint sets of strings P and Q over an alphabet Σ and a positive integer K, Alphabet Indexing is the problem of finding a mapping f from Σ to Γ = {1, ..., K} such that f transforms no two strings drawn from P and Q into identical ones in Γ*. Alphabet Indexing is named after "Hydropathy Index" for a..
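
The objective described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's greedy algorithm; it only evaluates a given mapping f by counting the pairs from P × Q that remain distinguishable. The function names (`apply_indexing`, `distinguishable_pairs`, `is_alphabet_indexing`) and the dictionary representation of f are assumptions made for illustration.

```python
from itertools import product


def apply_indexing(f, s):
    """Apply the symbol-wise mapping f (a dict) to string s.

    Returns a tuple of index symbols, i.e. the image of s in Γ*.
    """
    return tuple(f[c] for c in s)


def distinguishable_pairs(f, P, Q):
    """Count pairs (p, q) in P x Q whose images under f differ."""
    return sum(
        apply_indexing(f, p) != apply_indexing(f, q)
        for p, q in product(P, Q)
    )


def is_alphabet_indexing(f, P, Q):
    """True if f maps no string of P and no string of Q to the same image."""
    return distinguishable_pairs(f, P, Q) == len(P) * len(Q)


# Hypothetical toy instance over the alphabet {a, b, c} with K = 2.
P = ["ac", "ca"]
Q = ["bc", "cb"]
good = {"a": 1, "b": 2, "c": 1}  # keeps a and b apart
bad = {"a": 1, "b": 1, "c": 2}   # merges a and b
```

On this toy instance, `good` separates every pair, while `bad` collapses "ac" with "bc" and "ca" with "cb", so only two of the four pairs stay distinguishable.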

    Maximizing Agreement with a Classification by Bounded or Unbounded Number of Associated Words

    We study the efficient discovery of word-association patterns, defined by a sequence of strings and a proximity gap between them, from a collection of texts with binary labels. We present an algorithm that finds all d-string, k-proximity word-association patterns that maximize agreement with the labels. It runs in expected time O(k^{d−1} n log^{d+1} n) and space O(k^{d−1} n), where n is the total length of the texts, if the texts are uniformly random strings. We also show that finding a best word-association pattern with arbitrarily many strings is intractable and hard to approximate within a factor arbitrarily close to one, i.e., it has no polynomial-time approximation scheme unless P=NP. 1 Introduction: Data mining, which emerged in the early 1990s [1], aims to devise semi-automatic tools for discovering valuable "association rules" from facts stored in large-scale databases. An association rule is an implication between two conditions, namely a presumptive condition and the objective condition [..
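
To make the notion of agreement concrete, here is a naive Python sketch that scores a single candidate pattern against labeled texts. It is not the algorithm from the paper. The matching rule is an assumption: each string must start strictly after the previous one, at most k positions later. The names `occurrences`, `matches` and `agreement` are hypothetical.

```python
def occurrences(text, s):
    """Return all start positions at which s occurs in text."""
    return [i for i in range(len(text) - len(s) + 1) if text.startswith(s, i)]


def matches(text, strings, k):
    """Check whether the strings occur in order within proximity k.

    Assumed rule: consecutive strings start at positions p < q with q - p <= k.
    `strings` must be non-empty.
    """
    reachable = occurrences(text, strings[0])
    for s in strings[1:]:
        # Keep only occurrences of s that follow some reachable position closely enough.
        reachable = [
            q for q in occurrences(text, s)
            if any(0 < q - p <= k for p in reachable)
        ]
        if not reachable:
            return False
    return bool(reachable)


def agreement(texts, labels, strings, k):
    """Count texts whose match result equals their binary label."""
    return sum(matches(t, strings, k) == lab for t, lab in zip(texts, labels))


# Hypothetical labeled collection.
texts = ["ab cd", "cd ab", "ab"]
labels = [True, False, False]
score = agreement(texts, labels, ["ab", "cd"], 5)
```

A search procedure would maximize `agreement` over candidate patterns; this sketch only evaluates one pattern at a time.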

    A Space-Saving Approximation Algorithm for Grammar-Based Compression

    A space-efficient approximation algorithm is presented for the grammar-based compression problem, which asks, for a given string, to find a smallest context-free grammar deriving the string. For input length n and optimum CFG size g, the algorithm consumes only O(g log g) space and O(n log* n) time to achieve an O((log* n) log n) approximation ratio to the optimum compression, where log* n is the maximum number of logarithms satisfying log log … log n > 1. This ratio is thus regarded as almost O(log n), which is the currently best approximation ratio. While g depends on the string, it is known that g = Ω(log n) and g = O(n / log_k n) for strings from a k-letter alphabet [12].
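
The definition of log* n given in this abstract can be sketched in a few lines of Python. The sketch assumes base-2 logarithms, which the abstract does not state, and follows the abstract's wording (the largest number of nested logarithms whose result is still greater than 1). This count is one less than the more common "apply until the value is at most 1" convention for values such as n = 16. The function name `log_star` is an assumption.

```python
import math


def log_star(n):
    """Largest i such that applying log2 i times to n still gives a value > 1.

    Assumes base-2 logarithms; follows the definition quoted in the abstract.
    """
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        if x > 1:
            count += 1
    return count
```

For example, for n = 16 the chain of values is 4, 2, 1, so two nested logarithms stay above 1. The function grows so slowly that an O((log* n) log n) ratio is close to O(log n) for any practical input length.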

    Complexity of Finding Alphabet Indexing

    For two finite disjoint sets P and Q of strings over an alphabet Σ, an alphabet indexing ψ for P, Q by an indexing alphabet Γ with |Γ| < |Σ| is a mapping ψ : Σ → Γ satisfying ψ̃(P) ∩ ψ̃(Q) = ∅, where ψ̃ : Σ* → Γ* is the homomorphism derived from ψ. We defined this notion through experiments of knowledge acquisition from amino acid sequences of proteins by learning algorithms. This paper analyzes the complexity of finding an alphabet indexing. We first show that the problem is NP-complete. Then we give a local search algorithm for this problem and show a result on PLS-completeness. Key words: Algorithm and Computational Complexity, Alphabet Indexing, Local Search Algorithm, NP-Complete, PLS-Complete. 1 Introduction: Machine learning methods have been developed in [1, 2] to discover bioinformatical knowledge from amino acid sequences of proteins which are compiled together with their functional information in databases such as PIR [13]. The learning algorithm in [1] uses elemen..